As language models have grown in parameters and layers, training and inference on single GPUs have become much harder, severely restricting the availability of large language models such as GPT-3, BERT-Large, and many others. A common technique to address this problem is pruning the network architecture by removing transformer heads, fully-connected weights, and other modules. The main challenge is to discern the important parameters from the less important ones, and our goal is to find strong metrics for identifying such parameters. We thus propose two strategies for calculating importance scores: Cam-Cut, based on GradCAM interpretations, and Smooth-Cut, based on SmoothGrad. Through this work, we show that our scoring functions assign more relevant, task-based scores to the network parameters, and thus both of our pruning approaches significantly outperform standard weight- and gradient-based strategies, especially at higher compression ratios in BERT-based models. We also analyze our pruning masks and find them to be significantly different from those obtained using standard metrics.
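As a rough illustration of the Smooth-Cut idea, an importance score can average gradient magnitudes over noise-perturbed inputs, SmoothGrad-style, and prune the lowest-scoring weights. The sketch below is a minimal NumPy stand-in for a one-layer linear model with squared-error loss; all names (`smooth_importance`, `prune_by_score`) and the setup are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

# Minimal SmoothGrad-style pruning sketch for a one-layer linear model.
# Hypothetical stand-in, not the paper's Smooth-Cut implementation.

rng = np.random.default_rng(0)

def grad_sq_loss(W, x, t):
    """Gradient of 0.5 * ||x @ W - t||^2 with respect to W."""
    return np.outer(x, x @ W - t)

def smooth_importance(W, x, t, n_samples=20, sigma=0.1):
    """Average |gradient| over Gaussian-perturbed inputs, scaled by |W|."""
    g = np.zeros_like(W)
    for _ in range(n_samples):
        g += np.abs(grad_sq_loss(W, x + rng.normal(0.0, sigma, x.shape), t))
    return np.abs(W) * (g / n_samples)

def prune_by_score(W, scores, ratio=0.5):
    """Zero out the fraction `ratio` of weights with the lowest scores."""
    k = int(W.size * ratio)
    thresh = np.partition(scores.ravel(), k)[k]
    return np.where(scores >= thresh, W, 0.0)

W = rng.normal(size=(4, 3))
x = rng.normal(size=4)
t = np.zeros(3)
W_pruned = prune_by_score(W, smooth_importance(W, x, t), ratio=0.5)
```

In a transformer, the same scoring would be applied per head or per weight matrix before masking.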
ReLU fully-connected networks are ubiquitous but uninterpretable, because they fit piecewise linear functions emerging from multi-layered structures and complex interactions of model weights. This paper takes a novel approach to piecewise fitting by applying set operations over the individual pieces (parts). This is done by approximating canonical normal forms and using the resultant as a model. This offers special advantages: (a) strong correspondence of the parameters to the pieces of the fitted function (high interpretability); (b) the ability to fit any combination of continuous functions as pieces of the piecewise function (ease of design); (c) the ability to add new non-linearities in targeted regions of the domain (targeted learning); and (d) the simplicity of a single equation that avoids layering. It can also be expressed in the general max-min representation of piecewise linear functions, which gives theoretical ease and credibility. The architecture is tested on simulated regression and classification tasks and on benchmark datasets, including the UCI datasets, MNIST, FMNIST, and CIFAR-10. Its performance is on par with fully connected architectures. It can find a variety of applications where fully connected layers must be replaced by interpretable ones.
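For reference, the general max-min (lattice) representation of a continuous piecewise linear function referred to above takes the form

```latex
f(x) \;=\; \max_{1 \le j \le M} \; \min_{i \in S_j} \; \ell_i(x),
\qquad \ell_i(x) = a_i^{\top} x + b_i,
```

where each \( \ell_i \) is an affine piece and the index sets \( S_j \) select which pieces compete in each region of the domain.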
In this draft, we consider the problem of safe control of dynamical systems whose safety index (or, loosely, control barrier function) has relative degree equal to two. We consider parameter-affine nonlinear dynamical systems and assume that the parametric uncertainty is uniform and either known a priori or updated online through an estimator/parameter-adaptation law. Under this uncertainty, the usual CBF-QP safe-control approach takes the form of a robust optimization problem: both the right-hand and the left-hand side of the inequality constraint depend on the unknown parameters. With the given uncertainty representation, the CBF-QP safe-control problem becomes a convex semi-infinite problem. Using two different philosophies, one based on weak duality and the other on the lossless S-procedure, we arrive at identical SDP formulations of this robust CBF-QP problem. We thus show that the problem of safe control under known parametric uncertainty can be posed as a tractable convex problem and solved online. (This is work in progress.)
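Schematically (with the relative-degree-two bookkeeping suppressed), the robust CBF-QP described above can be written as

```latex
\min_{u}\ \lVert u - u_{\mathrm{nom}}\rVert^{2}
\quad \text{s.t.}\quad
L_{f} h(x;\theta) \;+\; L_{g} h(x;\theta)\, u \;\ge\; -\alpha\!\left(h(x)\right)
\quad \forall\, \theta \in \Theta,
```

where \( \theta \) is the unknown parameter vector and \( \Theta \) its uncertainty set. Because the constraint must hold for every \( \theta \in \Theta \), the feasible set is cut by infinitely many linear constraints in \( u \), which is what makes the problem semi-infinite and what the SDP reformulations resolve.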
English proficiency assessment has become a necessary metric for filtering and selecting prospective candidates in both academia and industry. With the rise in demand for such assessments, it has become increasingly necessary to produce automated, human-interpretable results to prevent inconsistencies and ensure meaningful feedback to second-language learners. Feature-based classical approaches are more interpretable for understanding what the scoring model learns. Therefore, in this work, we utilize classical machine learning models to formulate the speech scoring task as both a classification and a regression problem, followed by a thorough study to interpret the relation between linguistic cues and the English proficiency level of the speaker. First, we extract linguistic features under five categories (fluency, pronunciation, content, grammar and vocabulary, and acoustic) and train models to grade responses. In comparison, we find that the regression-based models perform equivalently to or better than the classification approach. Second, we perform ablation studies to understand the impact of each feature and feature category on proficiency grading performance. Further, to understand individual feature contributions, we present the importance of the top features for the best-performing algorithm on the grading task. Third, we make use of Partial Dependence Plots and Shapley values to explore feature importance, and we conclude that the best-trained model learns the underlying rubric used for grading the dataset employed in this study.
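The feature-based regression-plus-ablation workflow above can be sketched in a few lines: fit a ridge grader on category-grouped features, then refit with each category removed and record the error increase. The category names mirror the abstract, but the data, dimensions, and closed-form ridge solver are synthetic illustrations, not the paper's dataset or models.

```python
import numpy as np

# Synthetic sketch: ridge-regression proficiency grader plus a per-category
# ablation study. Data and feature widths are illustrative assumptions.

rng = np.random.default_rng(1)

CATEGORIES = {
    "fluency": 3, "pronunciation": 3, "content": 2,
    "grammar_vocab": 3, "acoustic": 4,
}

n = 200
X = rng.normal(size=(n, sum(CATEGORIES.values())))
true_w = rng.normal(size=X.shape[1])
y = X @ true_w + 0.1 * rng.normal(size=n)   # stand-in proficiency scores

def ridge_fit(X, y, lam=1.0):
    """Closed-form ridge solution (X^T X + lam I)^{-1} X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def mse(X, y, w):
    return float(np.mean((X @ w - y) ** 2))

full_err = mse(X, y, ridge_fit(X, y))

# Ablation: refit with each category's columns removed; positive deltas mean
# the category carried predictive signal.
ablation, start = {}, 0
for cat, width in CATEGORIES.items():
    keep = [j for j in range(X.shape[1]) if not (start <= j < start + width)]
    ablation[cat] = mse(X[:, keep], y, ridge_fit(X[:, keep], y)) - full_err
    start += width
```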
In this paper, we introduce a collaborative and modern annotation tool for audio and speech: audino. The tool allows annotators to define and describe temporal segmentations in audio. These segments can be labelled and transcribed easily using a dynamically generated form. An admin can centrally control user roles and project assignments through the admin dashboard. The dashboard also allows labels and their values to be described. Annotations can easily be exported in JSON format for further analysis. The tool allows audio data to be uploaded and assigned to users through a key-based API. The flexibility of the annotation tool enables annotation for speech scoring, voice activity detection (VAD), speaker diarisation, speaker identification, speech recognition, emotion recognition, and other tasks. The MIT open-source license allows it to be used for both academic and commercial projects.
In this study, we propose a novel multimodal end-to-end neural approach for the automated assessment of the spontaneous speech of non-native English speakers using attention fusion. The pipeline employs bi-directional recurrent convolutional neural networks and bi-directional long short-term memory networks to encode acoustic and lexical cues from spectrograms and transcriptions, respectively. Attention fusion is performed on these learned predictive features to learn complex interactions between the modalities before final scoring. We compare our model with strong baselines and find that combined attention to both lexical and acoustic cues significantly improves the overall performance of the system. Further, we present a qualitative and quantitative analysis of our model.
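A minimal sketch of the attention-fusion step above: assuming (hypothetically) that the acoustic and lexical encoders each emit a fixed-size vector, a softmax over learned per-modality scores yields the fusion weights. The vector sizes and the scalar-score parameterization are illustrative assumptions, not the paper's architecture.

```python
import numpy as np

# Attention fusion over two modality encodings (hypothetical sketch).

rng = np.random.default_rng(2)
d = 8
h_acoustic = rng.normal(size=d)  # stand-in for the spectrogram encoding
h_lexical = rng.normal(size=d)   # stand-in for the transcription encoding

def softmax(z):
    z = z - z.max()              # numerically stable softmax
    e = np.exp(z)
    return e / e.sum()

def attention_fuse(features, w_att):
    """Weight each modality vector by a softmax over scalar attention scores."""
    H = np.stack(features)       # (n_modalities, d)
    alpha = softmax(H @ w_att)   # one attention weight per modality
    return alpha, alpha @ H      # weights and fused (d,) vector

w_att = rng.normal(size=d)
alpha, fused = attention_fuse([h_acoustic, h_lexical], w_att)
```

The fused vector would then feed the final scoring head; inspecting `alpha` shows how much each modality contributed to a given prediction.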
We introduce Argoverse 2 (AV2) - a collection of three datasets for perception and forecasting research in the self-driving domain. The annotated Sensor Dataset contains 1,000 sequences of multimodal data, encompassing high-resolution imagery from seven ring cameras, and two stereo cameras in addition to lidar point clouds, and 6-DOF map-aligned pose. Sequences contain 3D cuboid annotations for 26 object categories, all of which are sufficiently-sampled to support training and evaluation of 3D perception models. The Lidar Dataset contains 20,000 sequences of unlabeled lidar point clouds and map-aligned pose. This dataset is the largest ever collection of lidar sensor data and supports self-supervised learning and the emerging task of point cloud forecasting. Finally, the Motion Forecasting Dataset contains 250,000 scenarios mined for interesting and challenging interactions between the autonomous vehicle and other actors in each local scene. Models are tasked with the prediction of future motion for "scored actors" in each scenario and are provided with track histories that capture object location, heading, velocity, and category. In all three datasets, each scenario contains its own HD Map with 3D lane and crosswalk geometry - sourced from data captured in six distinct cities. We believe these datasets will support new and existing machine learning research problems in ways that existing datasets do not. All datasets are released under the CC BY-NC-SA 4.0 license.
Object movement identification is one of the most researched problems in the field of computer vision. In this task, we try to classify each pixel as foreground or background. Even though numerous traditional machine learning and deep learning methods already exist for this problem, the two major issues with most of them are the need for large amounts of ground-truth data and their inferior performance on unseen videos. Since every pixel of every frame has to be labeled, acquiring large amounts of data for these techniques is rather expensive. Recently, Zhao et al. [1] proposed a one-of-a-kind Arithmetic Distribution Neural Network (ADNN) for universal background subtraction, which utilizes probability information from the histogram of temporal pixels and achieves promising results. Building on this work, we developed an intelligent video surveillance system that uses the ADNN architecture for motion detection, trims the video to the parts containing motion, and performs anomaly detection on the trimmed video.
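The core input to ADNN, a per-pixel histogram of temporal intensity values, can be sketched directly. The simplified stand-in below (not ADNN itself, which learns over the distributions) flags a pixel as foreground when its current intensity falls in a low-probability bin of its own history; function names and thresholds are illustrative assumptions.

```python
import numpy as np

# Simplified histogram-of-temporal-pixels background subtraction.
# A stand-in for ADNN's input representation, not the network itself.

def temporal_histograms(frames, bins=8):
    """frames: (T, H, W) uint8 stack -> (H, W, bins) normalized histograms."""
    T, H, W = frames.shape
    idx = (frames.astype(int) * bins) // 256        # bin index per pixel/frame
    hist = np.zeros((H, W, bins))
    for b in range(bins):
        hist[:, :, b] = (idx == b).sum(axis=0) / T
    return hist

def foreground_mask(frame, hist, bins=8, thresh=0.05):
    """Flag pixels whose current bin probability is below `thresh`."""
    idx = (frame.astype(int) * bins) // 256
    prob = np.take_along_axis(hist, idx[:, :, None], axis=2)[:, :, 0]
    return prob < thresh

# Static background at intensity 100; one bright "object" pixel appears.
frames = np.full((20, 4, 4), 100, dtype=np.uint8)
current = frames[0].copy()
current[1, 1] = 250
mask = foreground_mask(current, temporal_histograms(frames))
```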
The machine translation mechanism translates texts automatically between different natural languages, and Neural Machine Translation (NMT) has gained attention for its rational context analysis and fluent translation accuracy. However, processing low-resource languages that lack relevant training attributes, such as supervised data, is a current challenge for Natural Language Processing (NLP). We incorporated a technique known as Active Learning with the NMT toolkit Joey NMT to reach sufficient accuracy and robust predictions for low-resource language translation. In active learning, a semi-supervised machine learning strategy, the training algorithm determines which unlabeled data would be most beneficial to label, using selected query techniques. We implemented two model-driven acquisition functions for selecting the samples to be validated. This work uses transformer-based NMT systems: a baseline model (BM), a fully trained model (FTM), an active learning least-confidence-based model (ALLCM), and an active learning margin-sampling-based model (ALMSM), for translating English to Hindi. The Bilingual Evaluation Understudy (BLEU) metric was used to evaluate system results. The BLEU scores of the BM, FTM, ALLCM, and ALMSM systems are 16.26, 22.56, 24.54, and 24.20, respectively. The findings in this paper demonstrate that active learning techniques help the model converge early and improve the overall quality of the translation system.
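The two acquisition functions named above, least confidence and margin sampling, can be sketched over a pool of per-item class probabilities. Here `probs` is a hypothetical (pool, classes) confidence array; the actual systems score NMT hypotheses, but the selection rules are the standard ones.

```python
import numpy as np

# Standard active-learning acquisition functions (illustrative pool data).

def least_confidence(probs):
    """Score = 1 - max class probability; higher means more uncertain."""
    return 1.0 - probs.max(axis=1)

def margin_sampling(probs):
    """Score = negated gap between the top-2 probabilities."""
    part = np.sort(probs, axis=1)
    return -(part[:, -1] - part[:, -2])

def select_batch(probs, k, acquisition):
    """Indices of the k most uncertain pool items under `acquisition`."""
    return np.argsort(acquisition(probs))[-k:][::-1]

probs = np.array([
    [0.98, 0.01, 0.01],   # confident prediction
    [0.40, 0.35, 0.25],   # uncertain, small top-2 margin
    [0.70, 0.20, 0.10],
])
lc_pick = select_batch(probs, 1, least_confidence)
ms_pick = select_batch(probs, 1, margin_sampling)
```

Both rules select item 1 here; they diverge when the top probability is low but the runner-up is far behind.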
We study the problem of planning under model uncertainty in an online meta-reinforcement learning (RL) setting where an agent is presented with a sequence of related tasks with limited interactions per task. The agent can use its experience in each task and across tasks to estimate both the transition model and the distribution over tasks. We propose an algorithm to meta-learn the underlying structure across tasks, utilize it to plan in each task, and upper-bound the regret of the planning loss. Our bound suggests that the average regret over tasks decreases as the number of tasks increases and as the tasks become more similar. In the classical single-task setting, it is known that the planning horizon should depend on the estimated model's accuracy, that is, on the number of samples within the task. We generalize this finding to meta-RL and study the dependence of planning horizons on the number of tasks. Based on our theoretical findings, we derive heuristics for selecting slowly increasing discount factors, and we validate their significance empirically.